14 research outputs found

    Energy-Efficient FPGA-Based Parallel Quasi-Stochastic Computing

    The high performance of FPGAs (Field Programmable Gate Arrays) in image processing applications stems from their flexible reconfigurability, inherent parallelism, and large amount of internal memory. Lately, the Stochastic Computing (SC) paradigm has proven advantageous in certain application domains, including image processing, because of its lower hardware complexity and power consumption. However, its viability is limited by serial bitstream processing and the long run time required for convergence. To address these issues, this work proposes an energy-efficient implementation of SC that introduces fast-converging Quasi-Stochastic Number Generators (QSNGs) and parallel stochastic bitstream processing, both well suited to the FPGA's reconfigurability and abundant internal memory resources. The proposed approach has been tested on the Virtex-4 FPGA, and results have been compared with the serial and parallel implementations of conventional stochastic computation using the well-known SC edge detection and multiplication circuits. Results show that this approach decreases both execution time and power consumption by factors of 3.5 and 4.5 for the edge detection circuit and multiplication circuit, respectively.
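
    As a rough illustration of the multiplication circuit mentioned above: in unipolar stochastic computing, a value in [0, 1] is encoded as the fraction of 1s in a bitstream, and a single AND gate multiplies two uncorrelated streams. The sketch below compares comparator thresholds drawn from a pseudo-random generator with thresholds from a low-discrepancy sequence. The choice of van der Corput sequences in bases 2 and 3 is an assumption made for illustration only; the abstract does not say which quasi-random sequence the QSNGs use, and all function names here are hypothetical.

```python
import random


def van_der_corput(n, base):
    """Return the n-th element of the van der Corput sequence in `base`."""
    q, denom = 0.0, 1.0
    while n > 0:
        n, digit = divmod(n, base)
        denom *= base
        q += digit / denom
    return q


def bitstream(value, thresholds):
    """Comparator-style generator: emit 1 whenever the threshold is below `value`."""
    return [1 if t < value else 0 for t in thresholds]


def sc_multiply(x, y, thresholds_x, thresholds_y):
    """Estimate x * y as the fraction of 1s in the AND of the two bitstreams."""
    sx = bitstream(x, thresholds_x)
    sy = bitstream(y, thresholds_y)
    return sum(a & b for a, b in zip(sx, sy)) / len(sx)


def pseudo_random_estimate(x, y, length, seed=0):
    """Conventional SC: thresholds come from a pseudo-random generator."""
    rng = random.Random(seed)
    tx = [rng.random() for _ in range(length)]
    ty = [rng.random() for _ in range(length)]
    return sc_multiply(x, y, tx, ty)


def quasi_random_estimate(x, y, length):
    """Quasi-stochastic variant (assumed): thresholds from van der Corput bases 2 and 3."""
    tx = [van_der_corput(i, 2) for i in range(1, length + 1)]
    ty = [van_der_corput(i, 3) for i in range(1, length + 1)]
    return sc_multiply(x, y, tx, ty)


if __name__ == "__main__":
    for length in (16, 64, 256):
        print(length,
              pseudo_random_estimate(0.5, 0.5, length),
              quasi_random_estimate(0.5, 0.5, length))
```

    Using a different base for each stream keeps the two low-discrepancy sequences from being correlated, which the AND-gate multiplier requires.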

    Energy-Performance Scalability Analysis of a Novel Quasi-Stochastic Computing Approach

    Stochastic computing (SC) is an emerging low-cost computation paradigm for efficient approximation. It processes data in the form of probabilities and offers excellent progressive accuracy. Since SC's accuracy heavily depends on the stochastic bitstream length, generating acceptable approximate results while minimizing the bitstream length is one of the major challenges in SC, as energy consumption tends to linearly increase with bitstream length. To address this issue, a novel energy-performance scalable approach based on quasi-stochastic number generators is proposed and validated in this work. Compared to conventional approaches, the proposed methodology utilizes a novel algorithm to estimate the computation time based on the accuracy. The proposed methodology is tested and verified on a stochastic edge detection circuit to showcase its viability. Results show that the proposed approach offers a 12–60% reduction in execution time and a 12–78% decrease in the energy consumption relative to the conventional counterpart. This excellent scalability between energy and performance could be potentially beneficial to certain application domains such as image processing and machine learning, where power and time-efficient approximation is desired.
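
    The dependence of accuracy on bitstream length can be made concrete for the conventional case. Assuming each bit is an independent Bernoulli sample with probability p, the estimate from N bits has standard deviation sqrt(p(1-p)/N), so the length needed for a target deviation grows with the inverse square of that target. The sketch below only applies this textbook relation; it is not the estimation algorithm proposed in the work, and the function name is hypothetical.

```python
import math


def random_stream_length(target_std, p=0.5):
    """Shortest length N such that sqrt(p * (1 - p) / N) <= target_std.

    Assumes independent random bits; p = 0.5 is the worst case.
    """
    return math.ceil(p * (1.0 - p) / target_std ** 2)


if __name__ == "__main__":
    # Halving the target deviation quadruples the required length,
    # and with it the (roughly linear) energy cost noted above.
    for target in (1 / 16, 1 / 32, 1 / 64):
        print(target, random_stream_length(target))
```

    Low-discrepancy sequences are generally expected to converge faster than this inverse-square-root rate, which is the motivation for shortening the bitstream with quasi-stochastic generators.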

    Novel approaches for reliable and efficient circuit design

    In this research work, a suite of approaches is presented to improve the reliability of 3D heterogeneous processors (3DHP) and to reduce the area overhead of asynchronous designs. This work is divided into two parts. In the first part, we present an approach for improving reliability in 3DHP. Typically, in 3DHP, thermal hotspots introduce spatial and temporal variability that results in wide bit error variation in DRAM dies. To address this issue, a multi-path BCH decoder is introduced. Based on the thermal gradient data generated by on-chip temperature sensors, the proposed methodology adaptively estimates the number of errors in the incoming word and selects the fast decoding path to correct these errors. It thus provides DRAM error protection with minimal decoding latency. In the next part of this work, we focus on reducing the area overhead of asynchronous paradigm-driven null convention logic (NCL) design using Gate Diffusion Input (GDI). We first develop a technique for realizing NCL gates. In the process, we demonstrate that there is a voltage swing at the output that may introduce errors. To address this limitation, a HYBRID approach is introduced where conventional complementary metal oxide semiconductor (CMOS) technology is integrated with the GDI methodology. With this approach, we demonstrate that we can reduce the transistor count (TC) of the NCL designs while addressing the limitations due to voltage drop. To further reduce the TC of the NCL designs, GNCL is developed. This approach uses regenerative buffers to overcome the performance degradation and also reduce the area overhead. Overall, in this dissertation, we demonstrate reductions in area and power overheads for asynchronous designs. --Abstract, page iv

    Novel Area-Efficient Null Convention Logic based on CMOS and Gate Diffusion Input (GDI) Hybrid

    Null convention logic (NCL) is a promising delay-insensitive paradigm for constructing asynchronous circuits. Traditionally, NCL circuits are implemented using complementary metal oxide semiconductor (CMOS) technology, which has a large area overhead. To address this issue, a HYBRID methodology is introduced for realizing NCL circuits in this paper. The proposed approach utilizes both CMOS and gate diffusion input (GDI) techniques to significantly reduce the area. Compared with the conventional static CMOS NCL counterpart, the HYBRID implementation of an NCL up counter demonstrates an average 10% reduction in the transistor count.

    Multi-Stage BCH Decoder to Mitigate Hotspot-Induced Bit Error Variation

    3D heterogeneous integration (commonly termed 3DIC) of CPU, GPU and DRAM dies vertically interconnected by a massive number of TSVs (Through-Silicon Vias) is expected to overcome the limited bandwidth, high latency and energy consumption of off-chip DRAM. However, spatial and temporal variability in temperature (i.e., hotspots) is anticipated to result in bit error variation in the DRAM die. A novel multi-stage BCH decoder is proposed in this work to efficiently address this issue. The proposed multi-stage BCH decoder is designed to tolerate up to a certain maximum number of error bits per codeword, which is estimated from the on-line thermal gradient data, to minimize the decoding latency.

    Area Efficient Multi-Threshold Null Convention Logic

    Multi-Threshold null convention logic (MTNCL) is a commonly used asynchronous paradigm for designing low-power NCL circuits. Traditionally, MTNCL circuits are implemented using complementary metal oxide semiconductor (CMOS) techniques, which tend to occupy a large area. To address this limitation, a gate diffusion input (GDI) methodology is introduced for implementing MTNCL circuits. The GDI technique enables complex logic to be implemented using only two transistors, which helps to reduce area utilization. In this paper, a novel approach to implementing MTNCL designs based on the GDI methodology is proposed. The proposed approach has been verified by implementing a TH23 MTNCL gate. Compared to the conventional CMOS implementation, the proposed approach shows a 45% reduction in the area overhead.
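
    For readers unfamiliar with the TH23 gate named above, the following is a behavioral sketch of a standard NCL threshold gate with hysteresis: a THmn gate asserts its output once at least m of its n inputs are asserted and holds it until every input has returned to 0. The class name is hypothetical, and the model deliberately covers only the plain NCL behavior; it does not model the sleep mechanism of the MTNCL variant or anything about the GDI transistor-level implementation.

```python
class ThresholdGate:
    """Behavioral model of an NCL THmn threshold gate with hysteresis."""

    def __init__(self, threshold, n_inputs):
        if not 1 <= threshold <= n_inputs:
            raise ValueError("threshold must be between 1 and n_inputs")
        self.threshold = threshold
        self.n_inputs = n_inputs
        self.output = 0  # start in the NULL (deasserted) state

    def update(self, inputs):
        """Apply a new input vector and return the resulting output."""
        if len(inputs) != self.n_inputs:
            raise ValueError("wrong number of inputs")
        asserted = sum(1 for bit in inputs if bit)
        if asserted >= self.threshold:
            self.output = 1      # set: enough inputs are asserted
        elif asserted == 0:
            self.output = 0      # reset: all inputs are deasserted
        # otherwise hold the previous output (hysteresis)
        return self.output


if __name__ == "__main__":
    th23 = ThresholdGate(threshold=2, n_inputs=3)
    for vector in [(1, 0, 0), (1, 1, 0), (1, 0, 0), (0, 0, 0)]:
        print(vector, th23.update(vector))
```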

    Optimization of Null Convention Logic using Gate Diffusion Input

    Null convention logic (NCL) is a commonly used delay-insensitive paradigm for designing asynchronous circuits. Traditionally, NCL circuits are implemented using static complementary metal oxide semiconductor (CMOS) technology, which tends to have a large area overhead. To address this issue, a gate diffusion input (GDI) methodology is introduced for realizing NCL circuits. GDI is a low-power design approach that uses only two transistors to design complex circuits. With this design technique, a significant reduction in area utilization was observed at the expense of latency overhead. To address this limitation, a novel design approach based on the GDI methodology is proposed in this paper. The proposed fast GDI (FGDI) approach uses GDI functions F1 and F2 to reduce latency without affecting performance. To evaluate the performance of the FGDI technique, a one-bit full adder was realized in Cadence Virtuoso 45nm technology. Compared to the GDI implementation, the FGDI approach shows a 76% reduction in the latency.
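
    The GDI functions F1 and F2 mentioned above can be illustrated at the logic level. The sketch assumes the idealized GDI cell as it is commonly described: a gate input G shared by a pMOS/nMOS pair, with P feeding the pMOS source and N feeding the nMOS source, so the output follows P when G is 0 and N when G is 1. The function names are hypothetical, and the model ignores the voltage-swing and latency effects that the abstract is concerned with.

```python
from itertools import product


def gdi_cell(g, p, n):
    """Ideal two-transistor GDI cell: pass P when G is 0, pass N when G is 1."""
    return n if g else p


def gdi_f1(a, b):
    """F1 = (NOT a) AND b, obtained with G = a, P = b, N = 0."""
    return gdi_cell(g=a, p=b, n=0)


def gdi_f2(a, b):
    """F2 = (NOT a) OR b, obtained with G = a, P = 1, N = b."""
    return gdi_cell(g=a, p=1, n=b)


def truth_table(fn):
    """Outputs of a two-input function for (a, b) = 00, 01, 10, 11."""
    return [fn(a, b) for a, b in product((0, 1), repeat=2)]


if __name__ == "__main__":
    print("F1:", truth_table(gdi_f1))
    print("F2:", truth_table(gdi_f2))
```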

    Low-Power Null Convention Logic Multiplier Design based on Gate Diffusion Input Technique

    The increasing power consumption of synchronous circuits is a major concern in the semiconductor industry. The major contributors to this power consumption are the clock generator and the clock distribution network. This problem can be addressed by using asynchronous circuits. Null Convention Logic (NCL) is one of the best-known delay-insensitive approaches for designing asynchronous circuits. However, realizing NCL circuits using the commonly used complementary metal oxide semiconductor (CMOS) technique is said to increase the area and the power consumption. The low-power design technique known as Gate Diffusion Input (GDI) can be used for implementing NCL circuits to reduce both the area and the power. Applying external inputs to the sources of the pMOS and nMOS transistors reduces the area and the dynamic switching, thereby decreasing the transistor count and the power. The proposed GDI NCL technique is used for designing a 4-bit un-pipelined NCL multiplier. The design was realized and simulated in gpdk045 Cadence Virtuoso. In comparison to the CMOS model, the GDI model shows a 21.6% reduction in transistor count and a 13.7% reduction in dynamic power.

    Low-Power Null Convention Logic Design Based on Modified Gate Diffusion Input Technique

    Null Convention Logic (NCL) is one of the well-known clock-less approaches for designing asynchronous logic circuits. Complementary metal oxide semiconductor (CMOS) technology is usually used for implementing NCL circuits, which has the major drawbacks of large area consumption and power dissipation. These limitations are addressed in this work by adopting a low-power design technique called Gate Diffusion Input (GDI). The GDI technique allows primitive logic gates to be implemented using only two transistors. Thus, it reduces not only the transistor count but also the power consumption. However, the GDI technique suffers a significant voltage drop across the circuit due to its inherent voltage swings. To ensure full-swing output, regenerative buffers are added at the output stage, which tends to increase the overall latency. In this work, novel GDI and HYBRID (CMOS+GDI) designs are proposed to overcome the limitations of the CMOS-NCL designs. The proposed approaches were tested by realizing an NCL Ripple Carry Adder (RCA). The proposed models were simulated in Cadence Virtuoso, and power reductions of 14.9% and 9.8% were observed for the GDI and HYBRID models, respectively.